Database Caching and Invalidation using Database Provided Facilities for Query Dependency Analysis

ABSTRACT

Database data is maintained reliably and invalidated based on actual changes to data in the database. Updates or changes to data are detected without parsing queries submitted to the database. The dependencies of a query can be determined by submitting a version of the received query to the database through a native facility provided by the database to analyze how query structures are processed. The caching system can access the results of the facility to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies. An abstracted form of the query can be cached with an indication of the tables, rows, etc. that queries of that structure access or modify. The tables a write or update query modifies can be cached with a time of last modification. When a query is received for which the results are cached, the system can readily determine dependency information for the query, the last time the dependencies were modified, and compare this time with the time indicated for when the cached results were retrieved. By passing versions of write queries to the database, updates to the database can be detected.

PRIORITY CLAIM

This application claims the benefit of U.S. Provisional PatentApplication No. 60/684,610, filed May 25, 2005, entitled “TerracottaVirtualization Server”, and incorporated by reference herein in itsentirety.

CROSS-REFERENCE TO RELATED APPLICATIONS

The following applications are cross-referenced and incorporated byreference herein in their entirety:

U.S. patent application Ser. No. ______, filed concurrently, entitled“Database Caching and Invalidation Based on Detected Database Updates,”by Harward et al., filed concurrently (Attorney Docket No.TERA-01008US0); and

U.S. patent application Ser. No. ______, filed concurrently, entitled“Database Caching and Invalidation for Stored Procedures,” by Harward etal., filed concurrently (Attorney Docket No. TERA-01009US0).

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to databases and caching systemsfor databases.

2. Description of the Related Art

Databases are an integral part of many operations, businesses andorganizations where access to structured data is needed. Databases offermanageability for the vast amounts of data that many of today's largescale enterprises rely upon in day to day operations. Through the use ofdatabase storage, seemingly incomprehensible amounts of data can bemaintained in a structured environment so that requesting users caneasily locate and retrieve needed information. Many of today'slarge-scale applications such as e-commerce, banking, etc. are able topresent end-users with seemingly instant access to personalizedinformation by using databases. Without database storage in many ofthese environments, the task of methodically sorting through the amountof information necessary to access that which is needed would likelyfrustrate the ultimate purpose of the application to the point of endingmany operations.

Databases can be implemented in many languages and may support manyprotocols such as Structured Query Language (SQL) which provides usersand applications the ability to interface with the database through SQLstatements to select, insert, update, and delete information in thedatabase. Of course, the complexity of operations performed by andsubmitted to databases continues to expand. SQL statements can becombined to provide a high level of logic and interaction with data in adatabase.

As the amount of data maintained by databases continues to expand, theefficiency and performance of database access and response becomes evermore crucial. For instance, a database maintaining banking records forusers across the country or world must be able to efficiently respond torequests from numerous users accessing the database at the same timefrom various remote geographic locations.

Database caching techniques have been developed to decrease the amountof aggregate traffic transmitted between the database and requestingclient devices. By caching database information in smaller fastermemories, performance can be increased. Certain access requests will besatisfied from local cached versions of the database. This will not onlyincrease the performance as seen by requesting applications or entities,but also decrease the amount of load placed on the database.

An inherent difficulty with database caching is the structured nature ofboth the data storage and retrieval. In traditional data cachingscenarios, such as caching recently accessed data from a large memoryvolume that is indexed according to a physical or logical address,discrete data segments can be cached in a smaller memory and referencedaccording to their location on the large memory volume. The cache simplylooks for an update to the location on the memory volume correspondingto the cached data and when the data at that location changes, the cachecan simply invalidate or discard its cached results. In databases,however, data is retrieved by passing queries such as SQL statements.These statements can access, combine, and manipulate data from differentlocations to satisfy the query request.

Because of the structured nature of databases, database caches relyingon these traditional caching techniques are required to store a subsetof an entire database's data structures. Data is cached based on memorylocations and entries invalidated based on updates to those memorylocations. These systems do not cache results based on an actual query.Because queries may combine or access data from various locations in thedatabase, and request data based on logic that is dependent upon thedata stored therein, it is difficult to accurately determine when thepreviously retrieved results for a query are no longer valid because ofchanges to data in the database. Complex analysis to parse a receivedquery is thus required to determine what the query does at the database.While in theory such a technique is useful, in practice it presentsobstacles. Databases are typically written in a proprietary format andthe underlying source code is not made available to those wishing toimplement a caching system. Because an intimate knowledge of thedatabase's internal operations and code can not be had, parsing thequeries is in many instances deficient or even wrong. Moreover,individual database providers frequently alter their databases,requiring these parsing techniques to continually be updated as thedatabase is updated.

In other implementations, schedule or time-based caching has been usedfor databases to overcome the difficulty with parsing queries. Entriesin a cache are made for particular queries and the results of thosequeries. However, the entries are automatically invalidated after aspecified amount of time without any regard to actual changes at thedatabase. If changes to the database are made between invalidationperiods and a request is received, the cache may return invalid results.Additionally, this technique can degrade performance and efficiency byneedlessly invalidating accurate query results.

Accordingly, an improved system and technique for database caching isneeded to provide efficient, reliable, and accurate results.

SUMMARY OF THE INVENTION

A database caching system and related techniques are provided that canreliably maintain and invalidate database data based on actual changesto data in the database. Updates or changes to data at the database aredetected without parsing queries submitted to the database. Thedependencies of a received query can be determined by submitting aversion of the received query to the database through a native facilityprovided by the database itself to analyze how query structures areprocessed at the database. The caching system can access the results ofthe facility to determine the tables or rows (or other data partition) areceived query is dependent upon or modifies. In addition to the resultsof a query that can be cached with an indication of the query itself, anabstracted form of the query can be cached with an indication of thetables, rows, etc. that queries of that structure access or modify. Thetables a write or update query modifies can be cached with a time oftheir last modification. When a query is received for which the resultsare cached, the system can readily determine dependency information forthe query, the last time the dependencies were modified, and comparethis time with the time indicated for when the cached results wereretrieved. By passing versions of write queries to the database, updatesto the database can be detected. In one embodiment, a component isimplemented at or on the system of the database to directly detectchanges to the database data. This component can monitor transactionalinformation maintained by the database itself to determine when changesto the database occur. This component can communicate with the cache toprovide notification of changes to the database.

A method of caching database query results is provided in one embodimentthat includes receiving a request for information from a database anddetermining if previously received results for the request aremaintained locally. If previously received results are not maintainedlocally for the request, the method can include passing a form of therequest to the database by calling a native analysis facility at thedatabase to assess how the database processes requests of the form,accessing results of the assessment by the native analysis facility,determining from the results one or more dependencies of the databasefor requests of the form, maintaining data indicating that requests ofthe form for the database include the one or more dependencies, passingthe request to the database and receiving results, and maintaininginformation including the request and the results of the request.

In one embodiment, a database caching system is provided that includes adatabase including a native analysis facility to analyze querystructures and a caching intermediary in communication with the databaseand an application accessing the database. The caching intermediarymaintains one or more queries and results received from the database inresponse to the one or more queries. The caching intermediary receives afirst query from the application and determines if the first query andpreviously received results for the first query are maintained by thecaching intermediary. The caching intermediary passes a form of thefirst query to the native analysis facility at the database if thecaching intermediary is not maintaining results for the first query andaccesses results of the analysis facility to determine one or moretables of the database upon which queries of the form depend. Thecaching intermediary further maintains the form of the first query andan indication of the one of more tables of the database upon whichqueries of the form depend.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a database caching system in accordancewith one embodiment.

FIG. 2 depicts a database cache in accordance with one embodiment.

FIG. 3 is a block diagram of a database caching system in accordancewith one embodiment, wherein processing and caching database queries isdepicted.

FIG. 4 is a flowchart for processing and caching database queries inaccordance with one embodiment.

FIG. 5 is a block diagram of a database caching system in accordancewith one embodiment, wherein the detection of a backend update to thedatabase, and subsequent update to the cache is depicted.

FIG. 6 is a flowchart for detecting updates to a database and updating adatabase cache according to the update in accordance with oneembodiment.

FIG. 7 is a flowchart for monitoring updates to a database using alistener component implemented as a debugger for at least a portion ofthe database.

FIG. 8 is a block diagram of a database caching system in accordancewith one embodiment, wherein processing and caching of database storedprocedures is depicted.

FIG. 9 is a flowchart for processing and caching database storedprocedures in accordance with one embodiment.

FIG. 10 is a flowchart for caching database queries and invalidating theresults of the database queries based on row level updates to thedatabase.

FIG. 11 is a block diagram of a multi-level database caching system inaccordance with one embodiment.

DETAILED DESCRIPTION

Database Caching Intermediary

FIG. 1 is a block diagram depicting one embodiment of the technologydescribed herein executing in an application server—database serverenvironment. FIG. 1 depicts an application server 102 and databaseserver 122. A variety of applications 104 can be hosted and executedupon application server 102. Applications 104 can be implemented in anysuitable language, including but not limited to the Java™ programminglanguage, C++, etc. The application servers on which applications 104are executed will vary by embodiment, but can support such platforms asthe Java™ 2 Platform, Enterprise Edition (J2EE) Platform.

In one embodiment, servers, processing systems, and other processorbased devices as described can include, by way of non-limiting example,one or more processors capable of executing instructions provided onprocessor-readable media to perform the methods and tasks describedherein. The processing system may include a volatile memory, a massstorage device or other non-volatile memory, a portable storage device,one or more communications or network interfaces and various I/Odevices. A processing system hardware architecture as described is justone suitable example of a processing system suitable for implementingthe present technology. While embodiments are presented with respect toexemplary server-based implementations, variations for stand-alone,client, and other suitable processing environments can be made withoutdeparting from the scope of the present disclosure.

The database server 122 hosts a database application 120. Databaseserver 122 includes an operating system 124 and a non-volatile storagedevice 126 (such as a disk drive or RAID array). Numerous types ofdatabases 120 can be used in accordance with embodiments of thedisclosed technology. For example, database 120 can include a databaseprovided by Oracle™ Corporation of Sunnyvale, Calif., a databaseprovided by Sybase™ Corporation of Dublin, Calif., or a databaseprovided by Microsoft™ Corporation of Redmond, Wash. The technologydescribed herein is not limited to any particular type of databaseapplication. References to particular types and brands of databases andthe particular functions of a type of database are presented forexemplary purposes and are not intended to limit the scope of thepresent disclosure or claimed subject matter. The extension of anembodied concept from an exemplary described database to a differenttype of database will be readily apparent to those of ordinary skill inthe art. It is further noted that the disclosed server basedimplementations are exemplary and other implementations can be made onclient devices hosting the database and/or caching system, for example.Moreover, multiple levels of caching in accordance with embodiments canbe provided at one or more servers in addition to those indicated inFIG. 1. For example, a caching system implemented at a central servercould interface between the components of database server 122 and thoseof multiple application servers. Cached results can be maintained andretrieved from the central server in addition to or in place ofmaintaining and receiving them at an intermediary cache at theapplication server level.

Interface 106 is provided at application server 102 to enableapplications 104 access to database 120. Interface 106 will vary byimplementation according to the type of programming platform and/ordatabase provided. Interface 106 is provided in accordance with anApplication Programming Interface (API) that provides connectivitybetween the programming language(s) and platforms associated withapplications 104 and database 120. In many modern databaseimplementations, interface 106 is implemented in accordance with theJava™ Database Connectivity (JDBC) API, an industry standard interfacethat provides database independent connectivity between the Java™programming language and a wide range of databases. Interface 106 is aJDBC driver in one embodiment, implementing the JDBC API to provide acall level API for SQL-based database access. The JDBC driver canprovide transparent access for applications 104 to database 120.

A database caching intermediary 108 is provided at application server102 to increase the performance, reliability and availability of theinformation maintained by database 120 for requesting applications 104.Cache 108 can provide previously received results for queries or otherrequests/calls submitted to database 120 that are maintained locally atapplication server 102 to increase the performance and availability ofdatabase 120 while decreasing the response time for requests.Applications 104 request data from database 120 by submitting queries orother calls. For example, applications can submit commands formatted inaccordance with the specifications of interface 106 (e.g., as JDBCcommands) to provide SQL commands to database 120. Intermediary 108 canmore quickly provide the results of a query if valid results arecurrently stored in the cache. Moreover, intermediary 108 can increasedatabase availability by providing cached results when database 120 isinaccessible.

The structure of caching intermediary 108 may vary by implementationaccording to the requirements of a particular application. In theembodiment depicted in FIG. 1, caching intermediary 108 includes anintermediary interface or driver 112 and a cache 110. Interface 112 canprovide and execute many of the disclosed operations while cache 110maintains portions of the data from database 120 and internal data usedby the caching intermediary 108 that can maintain portions of database120 in local memory. Interface 112 conceptually wraps itself aroundnative interface 106 to provide transparent intermediation betweenapplications 104 and database 120. Applications 104 can communicate withinterface 112 in the same manner they would with the native databaseinterface. To requesting applications 104, interface 112 will appear thesame as the native interface 106. By wrapping around the nativeinterface, the intermediary interface forces applications 104 tocommunicate with the intermediary, instead of directly accessinginterface 106.

In one embodiment, interface 112 is a HA-JDBC High Availability Java™Database Connectivity (HA-JDBC) driver. Interface 112 will appear as astandard JDBC driver and thus provide a transparent solution toapplications 104 for caching query results. Intermediary 108 can beinstalled and integrated into an existing system without modifying anyapplication source or object code and without modifying any databasesource or object code. Applications 104 continue to access database 120in the typical manner, appearing to interface directly with the nativedatabase driver 106, while in fact, interfacing with the intermediarydriver 112 provided in accordance with one embodiment.

Caching intermediary 108 provides cached results that are maintainedcurrent and invalidated where appropriate based on updates to database120 or events processed by database 120. Changed data within thedatabase is the impetus for determining that cached database results areno longer valid. This contrasts with traditional database caches thatoperate as scheduled or time-based caches. For example, traditionalschedule-based caches invalidate cached results based on a set scheduleor time interval. Entries stored in the cache expire or are invalidatedperiodically based solely on an elapsed period of time. Invalidationsoccur in response to the mere passage of time without regard to anyactual changes at the database. By using such a technique, these systemsstand not only to return invalid data to requesting applications, butalso to needlessly invalidate valid cached data and decreaseperformance.

Depending Analysis Using Native Database Facilities

Intermediary 108 can detect actual changes to database 120 andinvalidate and/or update entries in cache 110 based on these updatesrather than an elapsed period of time according to a predeterminedschedule. In one embodiment, changes to database 120 are detectedthrough queries submitted by applications 104 and received at thecaching intermediary. In one embodiment, caching intermediary 108 isable to detect changes to database 120 made in response to querieswithout parsing the submitted queries. In one embodiment, the cachingintermediary 108 calls a native analysis facility provided by andinherent to the database to analyze queries received from requestingapplications. The analysis facility provides information regarding how aquery submitted for analysis is processed by the database. Cachingintermediary 108 can determine details of how the database processes thequery, including the tables or other data partitions the databaseaccesses to fulfill the requests.

It is important to note that the analysis facility called by cachingintermediary 108 is a native facility or process provided by thedatabase. This enables the intermediary to determine which tables of adatabase are accessed in responding to a query. Importantly, theintermediary is able to determine this information without parsing theactual query. Databases typically provide debugging and performanceanalysis facilities as native database procedures available toend-users. These facilities are traditionally used by databaseadministrators to debug queries or to increase the performanceassociated with a particular query. An administrator may view how aquery is processed and modify the query structure based on this reviewto maximize performance. Caching intermediary 108 can exploit one ormore of these native facilities to determine the dependencies of queriesit receives.

The use of native database facilities to determine this information is astrong departure from traditional caching techniques that rely onparsing queries. For example, intermediary 108 may call an “EXPLAINPLAN” facility to determine the dependencies of a query submitted to anOracle™ database. An abstracted or skeleton version of the query can beissued in a call to this facility to determine how such a query isprocessed and this, the tables accessed. Numerous types of nativedatabase facilities may be used in various embodiments depending on thetype of database platform, programming language, or server platform.

The use of native database facilities to determine dependency and otherinformation for a query is a significant and strong departure fromtraditional caching techniques that rely on query parsing, for example.To provide cached results based on changes to data in a database,previous techniques have relied on parsing the actual query receivedfrom an application to determine query dependencies, etc. These systemsinclude an inherent limitation that it is difficult if not impossible toprecisely parse a query without knowledge of the relevant databasesource code. Any cache provided by an entity without intimate knowledgeof the database code will not be able to fully understand and parsequeries designed for that database. The technology described hereinavoids these inherent limitations by relying on the database itself toassess and provide the necessary information for caching to theintermediary.

FIG. 2 depicts an exemplary implementation of intermediary cache 110 inaccordance with one embodiment. In FIG. 2, cache 110 includes threeindividual tables to facilitate caching and cache invalidation. Aresults table 204 maintains a list 210 of queries and the results 212 ofthose queries as retrieved from the database. Table 204 further includesan indication 214 of the time at which the query was processed and theresults obtained from the database. In one embodiment, the results aretime-stamped and the third column is not used.

Dependency table 206 maintains a list of query structures, also referredto as skeletons or footprints, along with a list of the tables fromdatabase 120 upon which queries of that structure are dependent. A queryskeleton is a form of an actual query without any variable data orliteral expressions. Literal expressions refer to the individualvariables within a query for which input data is passed to the databaseto retrieve unique records. For example, a simple query (shown as entry216 in table 204) that selects a record from an employee table where theuser_id column is 12 will have a skeleton as shown by entry 222 in table206. The skeleton shown is a form of the original query with the user_idliteral expression “12” replaced with “:v.” All queries having the sameskeleton will be processed in a similar manner and retrieve results fromthe same tables in the database. Any query having the structure of entry222 is dependent upon the employee table. Replacing the literalexpressions with “:v” is but one example of abstracting a query. Anysuitable place holder can be used and in one embodiment, the literalsections are simply “marked off” using some external method (e.g.,recording the indices of the start and end characters of each literalsection). Modification table 208 tracks modifications or updates to thetables in database 120. Table 208 lists table from database 102 and thetime at which the table was last updated. Using the tables of cache 110,the caching intermediary can determine if valid results are cached for aparticular query. If results are listed in table 204 for a receivedquery, table 206 is accessed to determine what tables the query isdependent upon. Table 208 is then accessed to determine when thosetables were last modified. If the time of the last modification isbefore the time the results were obtained (table 204), the results aredetermined to be valid.

A listener component 128 is also provided to detect changes to datamaintained by database 120. This component provides the additionalbenefit of being able to detect so-called backend updates to database120 that do not pass through intermediary 108. For example, database 120can be modified by a database administrator or other entity not incommunication with intermediary 108. Listener component 128 caninterface directly with the database at the database server to detectany backdoor updates that originate from beyond the reach or control ofthe caching intermediary.

In accordance with one embodiment, invalidations of cached databaseresults are based on actual changes to database data. Updates to thedatabase trigger the invalidation of cache entries within a cachingintermediary. A more efficient cache is provided by avoiding unnecessaryinvalidations of cached results, as inherent in time-based caches.Furthermore, invalidations based on changed data as determined from thedatabase provides consistency between data in the cache and data in thecache and database. Moreover, the reliance on the database to analyzequeries and determine dependency information avoids the limitationsassociated with parsing queries submitted to a database for which theunderlying code and operation is not well understood.

FIG. 3 is a simplified block diagram depicting the communication flow ofa database caching system in accordance with one embodiment. Application104 (e.g., Application A) at application server 102 communicates withcaching intermediary 108 to access database 120, implemented at databaseserver 122. Application 104 issues query 140 for execution by database120. Query 140 is received at caching intermediary 108. When query 140is received at the intermediary, the interface 112 accesses intermediarycache 110 to determine if the intermediary is maintaining valid resultsfor query 140, retrieved from the database in response to a previousrequest including query 140.

Interface 112 accesses cache 110 as indicated at 141 and determineswhether results for query 140 are currently maintained locally at thecache. If the results are currently stored—in results table 204 forexample—the intermediary accesses a dependency table 206 at cache 110 todetermine which tables at the database queries of the received query'sstructure are dependent upon. Table 208 is then accessed to determinethe last time each table query 140 is dependent upon was last updated.The time the results stored in results table 204 were obtained iscompared with the last time each dependent table was updated todetermine whether the cached results were retrieved from the databaseafter the last update to any dependent table of the query. If theresults were retrieved after the last update the cached results arereturned as indicated at 138.

If the results were retrieved before the last update or if no resultsare maintained, the database is accessed to process query 140. Interface112 abstracts the query by removing the literal expressions from thequery string to create a query skeleton 142 having the structure ofquery 140. After abstracting the query, intermediary interface 112passes the skeleton 142 to the native database interface which forwardsit to database 120. In one embodiment, interface 112 checks cache 110first to determine if the dependency analysis information for theabstracted query is already stored. If so, the abstracted query is notpassed.

If the query structure and dependencies of query 140 are not alreadystored, interface 112 passes query structure 142 by calling a nativeanalysis or tuning facility 162 provided by database 120. The termnative indicates that the facility or process called at the database isa process provided by and originating from the database itself, notcaching intermediary 108. Databases typically provide variousfunctionality for performance analysis, tuning, and debugging, etc.Debugging facilities can provide information to database administrators(DBAs) and developers about how a query will be processed by thedatabase. The facilities may analyze a query structure and specify whichtables are accessed to process queries of that structure, how or in whatorder the tables are accessed, how the data from the tables arecombined, and other types of information about how the databaseprocesses a query having the structure that was passed when calling theanalysis facility. These facilities provide information to DBAs, etc. sothat queries can be restructured or modified to optimize processing atthe database. In one embodiment for example, an Oracle™ databaseprovides a debugging facility commonly referred to as “EXPLAIN PLAN.” Inresponse to a call to the “EXPLAIN PLAN” facility passing a selectedquery structure for analysis, the database will analyze how the databaseprocesses that particular query structure.

When database 120 receives the abstracted query skeleton 142, it willexecute the facility called by the intermediary when passing theskeleton. After analyzing skeleton 142 to determine how queries havingthe particular structure are processed, intermediary interface 112accesses the results in order to determine the tables upon which queriesof structure 142 are dependent. Different types of databases make theseresults available in different formats. For example, database 120 maycreate a table, file, or other data structure 150 with the results ofits analysis and store it locally on disk 126, for example. Interface112 can access the results from disk to determine the dependencies forquery structure 142. An exemplary communication 144 from intermediary112 to listener 128 instructing the listener to read results 150 fromdisk 126 is depicted in FIG. 3. After reading the data from disk 126,listener 128 provides the requested information to interface 112 asdepicted by reply 145. In other embodiments, a database may provide thedata from an analysis by such a facility directly to the requestingapplication such as interface 112.

Numerous types of analysis facilities are provided by databases that canbe accessed in accordance with embodiments to determine querystructures' dependency information. Embodiments of the technologydescribed herein are not limited to any particular analysis facilityprovided by any particular database. It is important to note that acache in accordance with one embodiment can call a native databaseanalysis facility without having any predetermined knowledge of thequery or the database protocols to be able to parse the query itself.This enables the cache to be implemented transparently to bothrequesting applications and databases. Importantly, the cache does notparse the query received from the requesting application. This enablesthe cache to interface with databases having different protocols,platforms, programming languages, and interfaces. Because the systemcalls a native facility at the database, it does not need to fullycomprehend or be able to parse the queries it receives. The nativedatabase facility is relied upon for the required dependency analysis.

It is also important to note that these native analysis facilities suchas “EXPLAIN PLAN” are not intended or designed for use in cachingoperations as presently described. These facilities are typicallyprovided for database administrators to manually issue queries to thedatabase to analyze the performance of the database when executing aparticular query. These tools are valuable to database administratorsbecause they can design and restructure queries to maximize efficiencyand performance by the database. By taking advantage of such facilitiesfor caching query results, the complicated task of parsing queries isavoided while improving performance be reducing or eliminating parsingerrors. Because of their intended application, these analysis facilitiestypically provide large amounts of information in addition to indicatingthe tables the database accesses in response to a particular querystructure. In one embodiment, listener 128 filters the informationreceived from the database in response to the query analysis todetermine the tables accessed when executing the query.

After receiving dependency information, intermediary 108 passes theactual query 140 to database 120. Database 120 executes the query andreturns results 148. If intermediary 108 determines that query 140 is aread query (as can be determined from the query structure analysis), anew entry is created in a dependency table listing skeleton 142 (if anentry does not already exist) and the dependencies determined by thedatabase analysis facility. Intermediary 108 also creates an entry inresults table 204 for query 140, specifying the query 140, the resultsof the query provided by database 120, and the time at which the resultswere obtained. FIG. 3 conceptually depicts an update 143 issued frominterface 112 to cache 110 specifying the data to be stored in theresults and dependency tables. Processing of the query is complete whenresults 148 are provided to application 104. It should be noted that thequery and skeleton can be passed to the database in any order and thecache may determine the dependency information after receiving theactual query results.

If intermediary 108 determines that query 140 is a write query, itupdates or creates an entry in a modification table listing the tableseffected by such a write query and the time at which the write querymodified the tables. For example, if the write query modifies anemployee table, and an entry already exists in modification table 208for this table, the time at which that table is indicated to have lastbeen modified is updated to reflect the new modification time. Afterupdating or creating entries in the modification table, an indication isreturned to application 104 that the query has been successfullyprocessed.

In one embodiment, an entry is included in dependency table 206 forwrite query structures and the tables modified by those write querystructures. A database analyzing a write query structure will provideinformation regarding the tables updated or modified by that querystructure. When a query is subsequently received, the dependency tablecan be accessed to determine if dependency information for that query'sstructure is already stored. If so, the actual query can be submitted tothe database without passing the query structure to the database todetermine the dependent tables. When the results are received, themodification table can be directly updated. In one embodiment, thedependency table indicates whether a query is a write or read query andthus, whether the listed tables are modified by the query or whether thequery is dependent upon those tables.

In an alternate embodiment, entries in table 204 affected by a writequery are directly invalidated instead of updating times of lastmodification in table 208 that will be used later to determine if cachedresults are still valid. For example, each entry from table 204 can beread and the results of any queries that are dependent upon an updatedtable immediately invalidated. In one embodiment, the tables each querylisted in the results table 204 is dependent upon is determined fromdependency table 206. Intermediary 108 invalidates any results from theresults table that depend upon an updated table. While such embodimentsare possible, efficiency may decrease if numerous entries are listed intable 204. By maintaining a list of the last update times for eachtable, better efficiency can be achieved by invalidating results onlywhen a subsequent query corresponding to those results is received toavoid reading every entry from table 204 in response to any update tothe database.

FIG. 4 is a flowchart of a method for processing the database queriesand caching results thereof in accordance with one embodiment. A cachingsystem, such as intermediary 108 executing at application server 102,receives a query from an application at step 302. A results table isaccessed at step 304 to determine whether an entry exists for thereceived query. If results are not being maintained for the receivedquery, the query is abstracted at step 306 by removing any literalexpressions or variable data.

After abstracting the query, a dependency table maintained by thecaching system is accessed at step 308 to determine if an entry existsfor the structure of the abstracted query. If an entry exists, thetables the query is dependent upon (read query) or the tables the querymodifies (write query) are determined at step 310. After determining thetables at step 310, the unabstracted query is passed to the database atstep 312 for processing. After processing the query, the databasereturns the results to the cache.

If the abstracted query structure is not stored in the dependency table,the abstracted query skeleton is passed to the database at step 314 bycalling a native analysis facility provided by the database. The queryis passed onto the database and the results received at step 316. Theresults of the abstracted query analysis by the database are accessed atstep 318. From the results accessed at step 318 or step 310, the cachedetermines whether the query was a write or read query at step 320. Ifthe query is a read query, an entry is created in the dependency tableat step 322 setting forth the query structure and the table(s) thestructure is dependent upon. If an entry already exists as determined atstep 308, step 322 is skipped. An entry is created in results table 204at step 324 listing the actual query and the results received from thedatabase upon passing the query at step 312 or 316. These results arestored with an indication of the time they were received to facilitatesubsequent data validity determinations. The results of the query arereturned to the requesting application at step 326.

Returning to step 320, if the received query is determined to be a writequery, an entry is created in the dependency table at step 328 settingforth the query structure and the table(s) queries of that structuremodify. Any entries in modification table 208 that correspond to tablesaffected by the received query are updated at step 330 to reflect thetime of the latest update by the query. If no entry or entries exists intable 208 for the table(s) updated by the received query, new entriesare created. After updating and/or creating the entries at step 330, aresponse is returned to the requesting application at step 332indicating that the query has been successfully processed.

If it is determined at step 304 that an entry in results table 204 forthe received query already exists, table 206 is accessed at step 334 todetermine what tables the query is dependent upon after abstracting thequery. After determining the dependent tables, modification table 208 isaccessed at step 336 to determine the last time each dependent table waslast updated. If any dependent table was updated after the time ofretrieval for the query results as reflected in the time stamp at theresults table, the results are determined to be invalid at step 338. Theactual query is then passed to the database at step 340 and the resultsof the query received. Results table 204 is updated at step 342 toreflect the new results for the query that were received from thedatabase. The cache also adjusts the time for the query in the resultstable to reflect the time the new results were retrieved.

The results of the query are returned to the requesting application atstep 344 from the cache after determining that the query results arevalid at step 336. Otherwise, the results received at step 338 from thedatabase are returned at step 344.

Database Listener

FIG. 2 illustrates the detection of updates to a database directly by acaching intermediary in response to a received query and structuralanalysis provided by the database for the query. It is also possible forupdates to a database to be made without a caching intermediarydetecting them. For example, an application accessing the database froma location outside of application server 102 may send queries that donot pass through the caching intermediary. Other applicationsimplemented on different application servers with their owncommunication links to database server 122 may send updates that do notpass through the intermediary. Database administrators or other usersmay directly modify the database through an administration interfaceprovided by the database. Such updates may be referred to generally asbackend updates.

FIG. 5 illustrates a backend update 166 to database 120 originating froman entity beyond application server 102. Update 166 is received directlyat database server 120 and does not pass through the cachingintermediary 108 at application server 102. Although a backend update isdescribed for exemplary purposes, it is noted that updates that passthrough the caching intermediary may also be detected through thehereinafter explained techniques, in addition to being directly detectedby the intermediary.

As understood by those of ordinary skill in the art, databases typicallymaintain portions of user data in local memory for performanceconsiderations. Databases may aggregate updates to the database over aperiod of time and write them all to disk at once in the background whenit has ample bandwidth and resources to do so. Databases typicallymaintain a log or other identification reflecting all updates to thedatabase in order to track the updates that have propagated to disk andthose that are currently only maintained in local memory. A query thatupdates one or more tables in database 120 can be processed, and theupdated results maintained in a memory local to the database such as asoftware memory or a fast volatile memory such as RAM. Subsequentlyreceived queries that reference this data before it is written to thedatabase tables in non-volatile memory are processed using the updateddata in local memory. Database 120 may accumulate numerous updates tothe database from various queries before writing updated data to thetables or other data structures stored on disk 126 or other non-volatilememory.

The log maintained by a database, often referred to as a transactionlog, is continuously updated and written to a non-volatile memory suchas disk 126. If power loss occurs, causing data maintained in localmemory at the database to be lost, the transaction log enables thedatabase to “replay” any lost transactions to recover data not writtento disk before the power loss. Upon reboot, the database can read thetransaction log, compare it with the data stored on disk, and if anyupdates reflected in the log have not yet propagated to disk, update theappropriate tables on disk in accordance with the information in thetransaction log.

In FIG. 5, a transaction log 152 is maintained by database 120 andstored on disk 126. Listener 128 can utilize this transaction log todetect all changes to database 120 regardless of origin. When database120 receives a query, such as query 166, specifying an update to one ormore tables, the database calls the operating system to writeinformation relating to the query to the transaction log 152 at disk126. Database 120 causes the transaction log to be updated, in additionto making the changes specified by the query in local memory. Listener128 can access the updated transaction log 152 on disk 126 to determineif changes were made to database data. When listener 128 detects anupdate to the database, it notifies caching intermediary 108 that anupdate has been made as shown by communication 188. The intermediary canupdate a modification table to reflect the time at which the tablescorresponding to any previously listed entries were updated, and to addnew entries and modification times for any updated tables not alreadylisted. In one embodiment where cached results are invalidatedimmediately upon detecting an update and a table such as table 208 isnot used, intermediary 108 can update a results table 204 byinvalidating or deleting those entries dependent upon an updated table.

FIG. 6 is a flowchart depicting one embodiment for updating a cachebased on detecting changes to data at the database. In FIG. 6, adatabase server-side component monitors changes to a databasetransaction log to detect and report changes to database data to acaching intermediary remote from the database server.

A transaction log or other indication of transactional updates to thedatabase is accessed at step 352. A server-side listener component 128reads the transaction log and at step 354 compares the most recent datafrom the transaction log with data read from a previous version of thetransaction log. If the listener detects changes to the transaction log,it makes a determination at step 356 that the database data has beenmodified. If no changes to the transaction log are detected, thelistener determines that the database data has not been modified. In oneembodiment, selected tables of the database are monitored while othersare ignored. Listener 128 can be configured to determine that a changeto monitored database data has occurred when a change to the transactionby reflecting an update to one of the monitored tables if detected. Inone embodiment, listener 128 detects all changes to the log and databasebut only reports changes to specified tables.

When an update to database data, or an update to selectively monitoreddata is detected, the listener formulates an update to be sent to thecaching intermediary at the application server. An update is sent to thecaching intermediary at the application server at step 358. Theintermediary will update its local cache results and tables at step 360to reflect the changes to the database. For example, an indication ofthe time a table was last modified as maintained in a modification table208 can be updated to reflect the time of this latest update. After thelocal cache is updated, the caching intermediary can optionally send areply at step 362 to the listener indicating that the necessary updateshave been completed. After receiving the success response at step 362 ordetermining that no monitored tables have been updated at step 356, thelistener will read the transaction log again at step 352 to determine ifmore updates have been made. In one embodiment, a delay is implementedbefore accessing the transaction log again at step 352. Additionally,the listener may not wait for a success response from the cache beforeaccessing the transaction log again after sending an update at step 358.

In one exemplary embodiment, listener 128 is configured as a debuggerfor a native facility or process associated with database 120. As adebugger for a process executed by the database, listener 128, via anoperating system 124, can gain access to and control operationsperformed by database 120.

Databases typically provide a facility or process that is responsiblefor writing data to transaction log 152 in accordance with updates tothe database. This type of facility is commonly referred to as a logwriter process for the database. By disguising itself as a debugger fora database's log writer process, listener 128 can be notified whentransaction log 152 is updated, access the transaction log orinformation stored therein, and notify the cache of any relevantupdates.

Referring again to FIG. 5, the exemplary embodiment presented thereinincludes a listener 128 configured as a debugger for a native log writerprocess 154 of database 120. Listener 128 attaches itself to thedatabase through log writer 154 in such a manner to be notified ofsuccessful write operations to disk 126 that update or createtransaction log 152.

Database 120 receives query 166, specifying an update to one or moretables of the database. The database initially updates any data it isstoring in local memory to reflect the changes made by query 166.Database 120 also updates any copy of transaction log 152 it ismaintaining in local memory to reflect the updates.

Log writer process 154 is responsible for updating and writingtransaction log 152 to disk 126 when updates to the database are made.Database 120, via logwriter process 154, sends a write command 182 toupdate the file for transaction log 152 stored on disk 126 in order toreflect the updates made by query 166. Write command 182 is received byoperating system 124, which forwards the write command 182 to disk 126,typically via a disk driver (file system) for the disk. When disk 126has successfully written the data for transaction log 152, it sends aresponse 184 to operating system 124 indicating that the data wassuccessfully written.

As a debugger for log writer process 154, listener 128 instructsoperating system 124 to notify it when updates are made to transactionlog 152 by log writer process 154. In FIG. 5, listener 128 is configuredto be notified when operating system 124 receives a write request fortransaction log 152 from log writer process 154. When operating system124 receives write request 182, it notifies listener 128 at 186 that awrite request for the transaction log has been received. Operatingsystem 124 passes the new or updated data written to transaction log 152in response to update 166 to listener 128, as indicated at 186. Inanother embodiment, the operating system passes the full transaction logto listener 128. Listener 128 parses the information received fromoperating system 124 to determine the nature of the database changes. Ifthe listener determines that the updates made by query 166 effect one ormore tables of database 120 that it is configured to monitor, anotification 188 is provided to caching intermediary 108 so that it caninvalidate any query results that are dependent on the updated table(s).In one embodiment, the intermediary updates information maintainedlocally to indicate the time at which the tables were modified, bymodifying or creating new entries in a modification table such as table208. In one embodiment, listener 128 is configured to be notified when awrite operation for transaction log 152 has been successfully completed.When operating system 124 receives the success response from disk 126,it can notify listener 128 at 186 that a successful write operation tothe transaction log has been completed.

When operating system 124 receives success response 184 from disk 126,it will also reply to database 120 with success response 190 informingit that the transaction log was successfully updated. Log writer process154 receives response 190 and can issue a response to the requestingapplication that its update was successfully processed.

It is possible in a configuration like that of FIG. 5 that an updatefrom listener 128 to intermediary 108 may not be processed in sufficienttime to avoid returning old or no longer valid data to requestingapplications. Typically, databases will temporarily stop processingupdates to database data while the transaction log is writing the log todisk. Databases my further refrain from responding to read queries withupdated data until that data has been successfully written to the logfile maintained in non-volatile memory. This avoids data inconsistenciesshould the data in local memory be lost after it has been provided inresponse to a read request but before it is written to non-volatilememory. Log writer process 154 can pause database 120 from further writeand/or read query processing while it is writing the transaction log.When success response 190 is received, log writer process 154 knows theupdated data for the database is now maintained in non-volatile memory.Accordingly, the log writer process will free the database 120 tocontinue processing queries.

Consider a situation where listener 128 and database 120 receiveresponses 186 and 190, respectively, at the same time. Log writerprocess 154 will process success response 190 and then release thedatabase to use and respond with the newly updated data as well asprocess new queries. When intermediary 108 receives update 188 fromlistener 128, it updates its cached data as required to invalidate anyeffected result entries in results table 204. If the intermediaryupdates its records after database 120 is released to begin respondingand using the new data from query 166, intermediary 108 may return olddata in response to a query. Imagine that a read query is received byintermediary 108 after database 120 is released to use and respond withthe new data from query 166, but before the intermediary updates itsrecords to reflect that the relevant old entries are no longer valid.The intermediary may return the cached results from its local memory tothe requesting application since it is unaware of the changes made tothe database by query 150.

FIG. 7 is a flowchart of a method in accordance with one embodiment fordetecting changes to a database using a listener configured as adebugger for a native database facility responsible for updatingtransactional information to document changes to the database. In theembodiment of FIG. 7, a technique is implemented to ensure that acaching intermediary is updated in response to changes to the databasebefore the database is released to begin using and responding torequests with the new data. The listener presents itself as a debuggerto the operating system at the database server. As understood by thoseof ordinary skill, debuggers can gain access and control over selectportions of code, applications, or entire systems. By implementing thelistener as a debugger for a portion of the database, the listener cantake control of that portion to receive information regarding updates atthe database from the database itself. Because the listener is disguisedas a debugger, the operating system is instructed to notify the listenerof select actions and allow the listener to take control of selectactions to receive this updated information. The listener can thus takecontrol of a portion of the database to ensure that the cachingintermediary is updated before the database begins using updated data.While such an implementation is broadly and generally described as adebugger, it will be understood that the listener can gain access andcontrol of portions of the database using other suitable techniques,such as through various instrumentations of the database source orobject code.

The database receives an update to data stored in the database (e.g.,write query) at step 370. The database updates its local memory at step372 to reflect the changes made by the write query. The database updatesits transaction log in local memory at step 374 to reflect the updatesmade by the query. The database sends a command to the operating systemat step 376 to write the new information to the transaction log storedon a local non-volatile storage medium disk. In one embodiment, a logwriter process is responsible for writing the transaction log to diskand issues a write command to the operating system at step 376.

At step 378, the operating system issues a write command to update thetransaction log at the disk responsible for storing the transaction log.The write command, which includes the new information from the query, isreceived at the listener component at step 378. The listener componentis configured as a debugger for the log writer process so that theoperating system forwards the write request to it for processing. Thelistener presents itself as a debugger in order to gain access andcontrol of the log writer process to receive the transaction logupdates. The listener forwards the write request to the disk at step 382and receives a response that the data was successfully written at step380.

The listener is further configured to pause the log writer process atstep 834 when a success response is received from the disk indicating anupdate to the transaction log. In one embodiment, listener 128 pausesthe log writer process by instructing the operating system not to returnthe success response to the database. The listener temporarily pausesthe log writer process to insure that the caching intermediary at theapplication server has updated its local cache results updated inaccordance with the write or update query. Database 120 will not use thenew data provided by the query until it receives a notification that thetransaction log has successfully been written to disk.

The listener parses the information received from the operating systemat step 386 after the log writer process has been paused. The listenerdetermines the tables modified by the query. If tables the listener wasconfigured to monitor were updated, an update is sent at step 388 to thecaching intermediary at the application server. At step 390, the cachingintermediary updates its locally stored information (e.g., modificationtable 208) to reflect the time at which the tables were modified by thequery.

After the caching intermediary successfully updates its data, itprovides a success response to the listener at step 392. Upon receipt ofthe response, the listener instructs the operating system at step 394 torelease the log writer process. At step 396, a success responseindicating that the transaction log was successfully written is issuedto the log writer process. The log writer process releases database 120to use the new data from the query at step 398. In this manner, thedatabase will only use new data after that data has propagated to thecaching intermediary. Thus, each cache can remain in a consistent andaccurate data state with respect to the actual database.

In one embodiment, the listener is configured to be notified of updatesto the transaction log before they are written to disk. For example, thelistener component can be notified of updates by the operating systemwhen a write command is issued to the disk to update the transactionlog. If updates are made to tables which the listener is monitoring, thelistener begins updating the caches at the application level prior tothe disk handling the write command. It is possible that the transactionlog will not be successfully written to disk and thus, the database notupdated with the new data. The caching intermediary may invalidatecached results that are still valid as a result. While invalidations mayoccur more often than necessary in this scenario, no data consistencyissues will exist. The intermediary will simply provide queries to thedatabase for which it was maintaining valid results in its cache.

The listener is not limited to the particular structure of any disclosedarrangement and can be tailored to the requirements of individualimplementations. In one embodiment, the listener is configured as a filesystem driver for the disk on which the database transaction log is tobe stored. After the database receives an update to the database data,it issues a command to the operating system to write the new informationto the transaction log on disk. The operating system issues this writecommand to an interface for the disk adapted to translate internaladdresses used by the operating system to addresses understood by thedisk. The listener can disguise itself as the disk interface or driverfor the disk storing the transaction log in order to intercept commandsissued to the disk via the disk interface.

The listener will receive the command and forward it to the actual filesystem driver for the disk. The file system will pass the command to thedisk and receive a success response in return when the data issuccessfully written. The listener can be configured to access thetransaction log or the data passed in the write command to be written tothe transaction log to defect updates to database data. In oneembodiment, the listener accesses the data written to the transactionlog when the write command is issued to the file system for the disk. Inanother embodiment, the listener accesses this information when asuccess response is received from the disk after the data has beenwritten. In either case, the listener parses the information it receivesand provides an update to the caching intermediary if necessary.

Similarly to the debugger configuration, the listener can pass thesuccess response received from the disk to the operating systemimmediately upon receipt. The operating system will send a successresponse to the database, freeing the database to use the uploadinformation. This scenario can lead to data inconsistencies between thecache and database as previously described.

In another embodiment, the listener pauses the log writer process afterreceiving the success response from disk. The listener will wait untilit receives a success response from the caching intermediary beforeissuing the success response to the operating system. By pausing the logwriter process until the cache is updated, the listener ensures dataconsistency as already described.

Caching Stored Procedures and Invalidating Results

Stored procedures represent an increased level of complexity for cachingsystems associated with databases. Stored procedures define a set of oneor more operations and are stored at a database in a compiled format tobe called by requesting applications to receive the results of theoperations. Stored procedures typically include a set of SQL statements.Applications or developers wishing to access a stored procedure simplyneed to know the stored procedure's name, rather than schema, indexing,and column information required to develop a query. In addition to SQLstatements, stored procedures can include operational code. Storedprocedures can execute a set of code, call other databases to access ormodify records, directly access other files, and/or access and transmitother information, such as by calling other applications. Storedprocedures are valuable, if not essential, components of many databaseinstallations. They provide centralized query management forinstitutions, grouping operations and call commands for increasedsecurity and to avoid errors, etc. when writing individual queries andapplication code. Stored procedures improve database performance byreducing the total traffic between applications and the database. Anapplication can call a stored procedure once to execute numerous queriesand receive a single set of results, rather than issue each individualquery and receive a result for each issued query. The pre-compilednature of stored procedures provides significant improvements inperformance when compared to traditional query statements. Storedprocedures are compiled just once by the database which reuses thecompiled process whenever the procedure is called. Queries on the otherhand, are compiled each time the query is executed.

Caching techniques that rely on parsing queries received from requestingapplications are unable to handle stored procedures. A stored procedureis only definable and understood by the database on which it is stored.For example, a typical stored procedure as seen by an application orapplication server will simply be a textual invocation of the proceduresuch as foo( ). Any application or caching system interfacing with thedatabase does not know the internal operations defined by foo( ), andthus, can only see this textual call or issue similar calls to accessthe procedure. Parsing such a call to a stored procedure will not revealanything about the individual queries the stored procedure references orthe commands it executes at the database.

In order to run stored procedures, databases typically perform ananalysis before run-time to assess how the stored procedure isprocessed. For example, the database may compile the stored procedure,examining the structure of the procedure, the individual queries thedatabase tables accessed, in what manner and order the tables areaccessed, as well as the various other operations that the database mayperform as part of the stored procedure. The database may maintain thisinformation or compiled process locally to increase performance when arequest for this stored procedure is received. In this manner, thedatabase does not have to determine this information each time thestored procedure is called. Conceptually, the preparatory work by thedatabase for a stored procedure amounts to compiling the storedprocedure for run-time execution.

In accordance with one embodiment of the technology described herein,stored procedures are successfully cached and the results invalidatedbased on changed data in the database. FIG. 8 depicts one embodiment forcaching stored procedures. Application 104 issues a request 402 for astored procedure 403 maintained at database 120.

Application 104 (e.g., Application A) can issue a call 402 to a storedprocedure 403 and pass one or more variable values. Intermediary 108 canintercept the request as previously described for simple queries bywrapping itself around the native database driver 106. Intermediary 108opens a trace session at the database when it determines the request isfor a stored procedure. As understood by those of ordinary skill in theart, trace sessions are typically used and designed for debugging. Atracing process (e.g., 161) can collect various types of informationassociated with an application, system, or section of code during aperiod of time during which the trace is open. For example, a tracingfacility provided by an application can allow a user of the applicationto observe files accessed, files modified, new files created,applications called, methods called, and various other types ofoperations associated with the application. Various facilities providedby databases to track and document operations the database performsduring a period of time may commonly be referred to as tracingfacilities. An Oracle™ database application provides a facility called“trace.” Other database vendors and types have similar but differentlynamed facilities to perform these processes.

After opening the trace session with database 120, the intermediaryissues the stored procedure call 402 to the database. Database 120receives call 402 and executes stored procedure 403 by performing thevarious functions and queries specified by the stored procedure.Database 120 returns the results 406 of the stored procedure tointermediary 108 when it is complete. The results 406 of storedprocedure 403 are cached in a results table such as table 204 at theintermediary in the same manner query results are cached. Anidentification of stored procedure 403, including the literalexpressions passed with stored procedure call 402, are maintained alongwith the results 406 received from the database 120.

When intermediary 108 receives the results, indicating that the storedprocedure has been executed, it also closes the trace session withdatabase 120. When the trace session is closed, database 120 generates aresults file 409, which in the embodiment of FIG. 8 is stored at localdisk 206 in response to write command 408. The database may create andupdate the trace file while the trace session is opened in otherembodiments. Different databases record the results of trace sessions indifferent ways. Databases may create a trace file containing informationabout the actions taken while the trace session was opened. Otherdatabases may return information directly to the application opening thetrace session. The technology described herein is not limited to anyparticular type of database or native facility, and it is anticipatedthat the particular facility used may vary according to the requirementsof a particular implementation.

Intermediary 108 accesses the trace information from file 409 todetermine the queries executed at the database when performing storedprocedure 403. In the embodiment of FIG. 8, intermediary 108 issues arequest 410 to listener 128 to read trace file 409 from disk 126.Listener 128 responds by issuing a read request 412 to operating system124. The request is forwarded to the disk which returns the file to theoperating system as indicated at 414. The operating system forwards thefile to listener 128. Listener 128 parses the information in the tracefile to determine the individual queries that users executed to performstored procedure 403. Listener 128 can select the queries executed inresponse to the stored procedure by identifying those queries from thetrace file that include an identification associated with the cachingintermediary. Trace files may contain information identifying alloperations, queries, etc. a database performs while a trace session isopen, regardless of the originating application to which the databasewas responding. Databases typically store an identification of theapplication or process initiating each query in the trace file which thelistener can use to select only those queries executed for procedure403.

A list 416 of queries executed in response to the stored procedure issent from listener 128 to intermediary 108 after the trace file has beenparsed. From this list, intermediary 108 determines every query executedat the database in response to stored procedure 403. The intermediarydetermines each query executed by a stored procedure without parsing thestored procedure or the call issued by application 104.

Having determined the queries executed in response to stored procedure403, intermediary 108 determines which, if any, tables at the databasewere modified by the stored procedure. This determination is made aspreviously described for a standard query received at the cache.Intermediary 108 abstracts each individual query and passes it to thedatabase by calling a native database analysis facility to analyze howit processes a query of that structure. The abstracted query 418 ispassed to database 120 which analyzes it and returns results 420 tointerface 112 identifying any modified tables.

Intermediary 108 creates or modifies any entries in tables 206 and 208associated with portions of the database affected by stored procedure403. An entry is created (if not already existing) in table 206 for thestructure of stored procedure 403, along with the tables the storedprocedure is dependent upon. The intermediary can abstract a storedprocedure in the same manner as a query by removing any literalexpressions or variable data from the stored procedure call. Interface112 can list the abstracted stored procedure call in cache 110 with anindication of the tables the stored procedure is dependent on. Thedependencies are determined from individual query abstraction analyzedby the database. For stored procedures, table 206 can further include anindication of the tables the stored procedure modifies. Table 206 caninclude an indication for each listed table whether the proceduredepends on that tables or modifies that table. In table 208,intermediary 108 lists any tables the stored procedure modifies with anindication of the last time that table was modified. For anypre-existing entries, the intermediary updates the indication of thelast time of modification.

FIG. 9 is a flowchart depicting a method in accordance with oneembodiment for caching database stored procedures. A request or call fora stored procedure maintained at a database is received at step 702. Atstep 704, a results table containing previously received results forvarious stored procedures and/or queries is accessed to determinewhether an entry is present for the stored procedure referenced by thecall. If an entry does not exist for the particular stored procedure,the stored procedure is abstracted at step 706 and a dependency tablesuch as table 206 is accessed to determine if an entry is present forstored procedures having the structure of the received query at step708. If an entry for the query's structure is not present in thedependency table, a trace session is opened at the database at step 710.A request can be issued to the database to track the operations,transaction, queries, etc. processed while the trace session is open.After opening the trace session, the stored procedure request isforwarded to the database at step 712. The database executes the storedprocedure at step 714 by performing the actions specified in the storedprocedure using the data passed with the request. The database tracesits actions performed while executing the stored procedure at step 716and creates a file or other indication of the trace analysis. Afterexecuting the stored procedure, the database returns the results at step718. The caching intermediary can receive the results and store them ina results table at step 720. An indication of the stored procedure alongwith the results of its execution can be stored in a results table suchas 204. The results are returned to the requesting application at step722. After caching the results of the stored procedure, the tracesession at the database is closed at step 724. In one embodiment, thetrace session is closed when the results from the database are received.

The caching intermediary can issue a request to access the traceinformation at step 725 if it is not automatically returned aspreviously described. For example, a request can be issued to a listenerinstalled at the database machine to read a file such as trace file 409storing the trace data. The listener can issue a read command and parsethe information received from the requested trace file. The listener canreturn all or a portion of this data to the caching system at step 726.In one embodiment, the listener parses the data and returns a list ofthe queries executed by the database for the stored procedure. In otherembodiments, the listener can pass the entire trace file to theintermediary which can perform the proper analysis to determine thequeries. IN one embodiment, the trace information is automaticallyreturned to the requesting application such that steps 725-726 can beskipped. In any event, the queries that are executed as part of thestored procedure are determined at step 727.

Each query executed as part of the stored procedure is abstracted atstep 728 by removing any literal expressions from the query. Eachabstracted query or skeleton is passed to the database at step 730 bycalling a native facility of the database to analyze the structure ofeach query. The results for each query skeleton are accessed at step732. The intermediary creates an entry in a dependency table such astable 206 at step 734. Each entry will include a stored procedureskeleton and the tables it is dependent upon. In one embodiment, anytables modified by the stored procedure are also listed in thecorresponding entry. The table can include an indication of whether alisted table is a dependency of the query or a table modified by thequery. At step 736, entries in a dependency modification table such astable 208 are created or updated for any tables modified by the storedprocedure. For each table that was updated in response to the storedprocedure command, an existing entry in the modification table can beupdated with the time the stored procedure modified the correspondingdatabase table if an entry for the table already exists. If an entry forthe table does not already exist, a new entry is created by listing thetable and time of modification. The results table, dependency table, andmodification table can be used for future stored procedure requests todetermine whether results at the cache are validated and can be returnedinstead of retrieving them from the database itself.

If it is determined at step 708 that the structure of the storedprocedure is already cached in a table such as 206, for example, thedependencies of the stored procedure structure are determined at step756. A call to the stored procedure is issued at step 758 and theresults received. At step 760, the results are cached by theintermediary in the results table. The results are returned to therequesting application at step 762. The modification table 208 isupdated at step 764 if the stored procedure includes write queries.

If it is determined at step 704 that an entry in the results tablealready exists for the stored procedure, the dependency table isaccessed at step 736 to determine the tables the stored procedure isdependent upon. The stored procedure can be abstracted to determine ifan entry in the dependency table exists for queries of the receivedquery's structure. The time of last modification of each dependent tableis determined at step 742. At step 744, it is determined whether thecached results are still valid. In one embodiment, a time stamp for theresults determined from the results table is compared with the timeslisted in the modification table for each table the stored procedure isdependent upon. If the results in the results table were retrievedbefore an update to one or more of the tables modified by the storedprocedure as indicated in the modification table, the cached results arenot valid and are not returned to the requesting application. A call tothe stored procedure at the database is then issued at step 746 and theresults received. The results of the stored procedure are stored in thecache at step 748 with an indication of the time they were obtained. Theresults are returned to the requesting application at step 750. Themodification table is updated at step 752 to reflect the time of thelatest updates if the stored procedure modifies any tables.

If the cached results are determined at step 744 to be valid, theresults are returned to the requesting application at step 754. Asmentioned, stored procedures can include write queries as well as readqueries. In one embodiment, a call to a stored procedure is issued tothe database even when the results of the procedure maintained by thecache are valid and can be returned to the requesting application. Thestored procedure is issued so that the updates to the database by thewrite portions of the procedure can be made. The cached results canstill be returned to increase performance in responding to applications.When the results of passing the stored procedure to the database arereturned, the caching system can update the modification table toreflect the time of the new updates. Additionally, the results can beused to update the results table with new results for the storedprocedure and the time these results were retrieved.

In one embodiment, the results table and/or dependency table can includean indication for each stored procedure entry whether that storedprocedure modifies any data at the database. An embodiment thatspecifies whether each table in the dependency table is updated by thecorresponding stored procedure or whether the stored procedure modifiesthe listed table has already been described. When the intermediarydetermines that a requested procedure contains a valid entry in theresults table and the procedure does not modify data at the database,the cached results are returned and the stored procedure is not passedto the database. If the entry indicates that the stored procedure doesmodify data, it can be passed to the database to modify database dataafter returning the cached results to the requesting application. In yetanother embodiment, a caching system is implemented to only cache storedprocedure results and other information for procedures that do notmodify data at the database.

Row-Level Caching and Invalidation

The embodiments described thus far have described table based cachingsystems that invalidate entries when a dependent table of a query ismodified. The disclosed techniques are extendable to other levels ofinvalidation based on more refined analysis of modified data. Forexample, the disclosed technology can be used to invalidate cached queryresults when a row the query is dependent upon is modified. Likewise,results can be invalidated only when an actual entry in the database thequery is dependent upon is modified. These techniques can increaseefficiency and performance by further reducing the number of times validquery results in the cache are invalidated.

FIG. 10 is a flowchart depicting a row-based caching and invalidationtechnique in accordance with one embodiment. In FIG. 10, row basedcaching is implemented for queries whose dependencies are limited to theretrieval of records (row of data) based on a unique identifier for therow. For example, if a query specifies “select * from employee whereuser_id=1111”, and user_id is a unique user_id assigned to eachindividual in the organization, such a query can be cached andinvalidated in accordance with the disclosed techniques. An update tothe individual record (row) of the employee table specified in the querycan be the basis for invalidating the cached query results. If an updateis made to another record (row) in the employee table, the cachedresults are not invalidated since the update does not affect the cachedresults.

For queries that depend on something more than a unique recordidentifier, the table-based caching techniques previously described areemployed. Consider an exemplary query that specifies, “select * fromemployee where salary >100000.” The results of this query depend upon acolumn in the employee table other than the unique record identifier forthat row (e.g., user_id or social security number). The query is notmerely dependent upon a part of the entry that is unique for eachindividual record, but rather the salary of an employee which is anon-unique portion of a record. If the results for this query are cachedand a subsequent update increases the salary of an employee that waspreviously below $100,000 to above $100,000, the cached results will nolonger be valid. The employee record with the salary column updated to avalue above $100,000 should now be included in the query results.However, the update does not affect any rows that were returned inresponse to the original query. If invalidation is based only on changesto those records or rows included in the original results, row-basedcaching in this example would fail to properly invalidate the cachedresults and invalid results returned for a subsequent received versionof this query.

In FIG. 10, selective row-based caching is implemented to guaranteeaccurate cache results while also improving performance by reducing thenumber of invalidations inherent in a table-based invalidationimplementation. A query is received from a requesting application atstep 802 and at 804, it is determined whether an entry corresponding tothe received query exists in a results table. If there is nocorresponding entry in the results table, the query is abstracted atstep 806 and a determination made at step 808 as to whether an entry forthe abstracted query skeleton exists in a dependency table.

If no entry exists for the query skeleton, the resulting skeleton ispassed to the database at step 810 by calling a native analysis facilityof the database to assess how the query structure is handled. Theresults of the analysis are accessed at step 812. At step 814, an entryis created in the dependency table for the query skeleton. Dependencytables in the row-based technique depicted in FIG. 10 are implemented aspreviously described with respect to table-based caching. The entrylists the abstracted query skeleton and any tables queries having thatstructure are dependent upon.

At step 816, it is determined whether the query requests records basedsolely on a unique identifier for individual records of thecorresponding table. If the query depends solely on a unique identifierfor a row of the relevant table, the process continues at step 818 whereit is determined if the query is a write or read query. If the query isa read query, the query is modified at step 820 to include a request forthe row ID number of the record requested by the query. Each record in atable includes an internal identifier used by the database to uniquelyidentify every record. This identifier or row ID number is differentthan the unique user data identifier referred to previously.

The modified query is passed to the database at step 822. The results ofthe query are received from the database at step 824. The query resultsare used to create an entry in results table 204 at step 826. The entrycreated in table 204 will not only include the query, the results, andthe time the results were obtained, but also the row ID numberassociated with that query that was retrieved via the modified query.The row ID number is used to determine whether cached query results arevalid when subsequent requests including the cached query are received.The query results are returned to the requesting application at step828. If the cached system determines at step 818 that the query is awrite query, a modification table is updated at step 830 to indicate themodifications to the tables effected by the write query. To facilitatethe row-based caching technique of FIG. 10, the modification tablestores an indication of the times rows within a table were last updated.At step 830, the modification table is updated to include the tabledependencies determined at step 812. For each entry, the modificationwill specify the row_id for the specific record(s) modified by the queryas well as the table that was modified and the time of modification. Ifit is determined at step 816 that the query does not requests recordsbased solely on a unique identifier for individual records of thecorresponding table, the method continues at step 832 where table basedcaching is instituted according to steps 320-332 of FIG. 4. Steps 322and 328 are skipped if the query's structure is already cached.

If it is determined at step 804 that an entry exists in results table204 for the query received at step 802, the dependencies for the queryare determined at step 834. The tables the query is dependent upon aredetermined from dependency table 206. The row_id number of the query isdetermined from results table 204 at step 836. Modification table 208 isthen searched at step 838 for an entry matching the dependent table ofthe query and the row_id number of the query. If an entry is found andthe time of last modification to the specified row was before theresults stored in table 204 were obtained, the cached results aredetermined to be valid at step 840. The cached results are returned tothe requesting application at step 842. If no entry is found, it isdetermined that the cached results are valid and they are returned atstep 842. If an entry is found and it indicates an update to the rowafter the results were obtained, the query is passed to the database atstep 844 and the results received. The results are provided to therequesting application at step 846.

It should be noted that row-based caching and table-based caching can becontemporaneously implemented in accordance with one embodiment. Forexample, table 208 may only list table identifications for certainentries along with the time the table was modified when the entry is fora query that is dependent on something other than a unique recordidentifier. Table 208 may also include entries including row ID numbersand table IDs for other entries. Likewise, entries in table 204 for thequery results may list the row_id numbers for queries meeting the uniquerow identifier dependency limitation. For more complicated queries withadditional dependencies, only the results and time will be maintained.In this manner, when a query is received that has an entry in table 204,an intermediary can determine the row ID dependency from table 206 inorder to compare that dependency with the time of last update from table208 to determine whether the results in table 204 are valid.

In one embodiment, a caching system can be distributed across one ormore servers or other systems at various levels to provide furtherperformance and availability. FIG. 11 depicts an exemplary embodimentincluding caching intermediaries 108 distributed at application servers102 as well as a central server 902. Central server 902 is implementedbetween application servers 102 and database server 122 to provide anadditional level of caching.

Each application server includes a caching intermediary 108 aspreviously described. Central server 902 also includes a cachingintermediary 108, providing a centralized cache for results andinvalidation information that can be used by the individualintermediaries at each application server. In FIG. 11, the cachingintermediary 108 at the central server includes similar cache 110 andinterface 112 components as the local caching intermediaries. Thesecomponents may include additional operations and perform slightlydifferent operations in such a multi-level environment. Again, differentimplementations of intermediary 108 can be made in accordance withembodiments. In FIG. 11, the listener component 128 interfaces with thecaching intermediary 108 at the central server 902 to provideinformation regarding updates to the database. The intermediary at thecentral server can maintain this information for invalidating cacheentries. In one embodiment, the central server intermediary can passthis information to the local intermediaries at the application serversso it can be used there for invalidations. In yet another embodiment,the listener component 128 can interface directly with each cachingintermediary 108 at the application server level and central serverlevel.

In an exemplary operation of the system of FIG. 11, an application 104at any of the application servers can issue a query or stored procedurecall which is intercepted at the local caching intermediary 108associated with that application. The caching intermediary can determineif it is storing dependency information for the structure of thereceived query. If the structure is stored locally, the localintermediary can determine if it is storing valid results for thereceived query as previously described. If the local intermediary isstoring valid results, it can further access the central server'sintermediary 108 to assess the validity of the results.

An application at one of the other application servers in communicationwith the central server may update database 120 in a transaction notseen at the local intermediary processing the current request. However,a query received from any of the local application servers will passthrough central server 902. Accordingly, the local intermediary canaccess the central server to determine the last time any table, row,etc. upon which the received query is dependent was last modified by anyof the applications at any of the application servers. If the localintermediary determines that its local results are valid, those resultscan be returned to the requesting application. If the local results arenot valid, the local intermediary passes the query to the cachingintermediary at the central server 902. The caching intermediary atcentral server 902 will return its locally cached results if they arevalid.

If neither the local intermediary nor the central intermediary aremaintaining valid results for the query, the intermediary at the centralserver will forward the query to the database and receive results. Theintermediary at the central server will update its cache with the newresults, time of retrieval, and/or any portions of the database modifiedby the query if necessary. The results will be returned to the localintermediary which will also update its cache in accordance with the newresults, time, etc.

In one embodiment, the central server's caching intermediary can alsosend the results and any update information to the local intermediariesat the application servers in response to updates to the database fromany of the application servers and/or in response to a direct updatedetected by listener component 128. In this manner, the localintermediaries need only check their local caches to determine if theircached results are still valid.

If the local intermediary receiving the query initially determines thatit is not storing dependency information for the structure of thereceived query, it can abstract the query and pass it to theintermediary at the central server to determine if the intermediary atthe central server is maintaining the dependency information. Forexample, dependency analysis may have already been performed by thedatabase for the received structure in response to a query from anotherapplication server. If the intermediary at the central server ismaintaining the information, it can pass it to the local intermediarywhich can cache the information for later use. Upon receiving thedependency information in one embodiment, the local intermediarydetermines that it is not storing valid results for the query andforwards the query to the central server's intermediary. In otherembodiments, the local intermediary can pass the query to the centralserver's intermediary as soon as it determines that it is not storingthe dependency information, rather than wait to receive thisinformation.

If the database is not storing the dependency information for a query ofthis structure, it will send a response to the local intermediaryinforming it that the query structure has not yet been analyzed or thatqueries of that structure are not cacheable. In one embodiment, thecentral server intermediary stores information to indicate querystructures that were previously determined to not be cacheable to avoidthe performance of dependency analysis more than once for the samestructure. Additionally, the local intermediaries can maintaininformation about query structures that were previously determined tonot be cacheable so that a request to the central server can be avoidedwhen it is known that queries of the received structure are notcacheable.

After receiving a notification from the central server intermediary thatthe dependency information is not being stored or that queries of thereceived query's structure are not cacheable, the local intermediarysends the actual query to the intermediary at the central server. Thelocal intermediary also sends the actual query to the intermediary afterreceiving dependency information and determining that it is notmaintaining valid results for the actual query as mentioned above. Thecentral server intermediary can again check whether it is storingdependency information for the received query. In other embodiments, thecentral server intermediary only performs this check once. The centralserver intermediary will pass the abstracted query to the database bycalling a facility such as “EXPLAIN PLAN” as previously described. Theintermediary at the central server will also pass the actual query tothe database. When the actual query results are received, theintermediary at the central server can cache the results and forwardthem to the local caching intermediary which will also cache theresults. Likewise, when the results of the dependency analysis arereceived, the central server intermediary will cache the dependencyinformation, etc. and forward the results to the local intermediarywhich will cache them as well. In one embodiment, the central serverintermediary and local intermediaries wait to receive the dependencyanalysis information before caching query results or dependencyinformation.

In FIG. 11, a dual-level caching system is provided with caches at theapplication server level and another server level implemented betweenthe application servers and database server. In one embodiment,additional levels of caching can be used. For example, applicationserver caching systems may interface with a regional server running acaching system in accordance with an embodiment. Multiple regionalservers may interface with a central server running a caching systemthat interfaces with the database system. Other variations withadditional levels of caching or further distribution of componentsacross these servers can be implemented in accordance with variousembodiments.

While exemplary embodiments have been described in which caching systemsor intermediaries execute on application servers and other servers, anytype of processing or computing device may be used, including personalcomputers, minicomputers, mainframes, handheld computing devices, mobilecomputing devices, and so forth. Typically, these computing devices willinclude one or more processors in communication with one or moreprocessor readable storage devices, communication interfaces, peripheraldevices, and so forth. Examples of storage devices include RAM, ROM,hard disk drives, floppy disk drives, CD ROMS, DVDs, flash memory, andso forth. Examples of peripherals include printers, monitors, keyboards,pointing devices, and so forth. Examples of communication interfacesinclude network cards, modems, wireless transmitters/receivers, and soforth. In some embodiments, all or part of the functionality isimplemented in software, including firmware and/or micro code, that isstored on one or more processor readable storage devices and is used toprogram one or more processors to achieve the functionality describedherein.

The foregoing detailed description has been presented for purposes ofillustration and description. It is not intended to be exhaustive or tolimit the disclosure and any claimed subject matter to the precise formdisclosed. Many modifications and variations are possible in light ofthe above teachings. The described embodiments were chosen in order tobest explain the principles of the technology and its practicalapplication to thereby enable others skilled in the art to best utilizethe technology in various embodiments and with various modifications asare suited to the particular use contemplated. It is intended that thescope of the present disclosure and its claimed subject matter bedefined by the claims appended hereto.

1. A method of caching database query results, comprising: receiving arequest for information from a database; determining if previouslyreceived results for said request are maintained locally; if previouslyreceived results are not maintained locally for said request: passing aform of said request to said database by calling a native analysisfacility at said database to assess how said database processes requestsof said form; accessing results of said assessment by said nativeanalysis facility; determining from said results one or moredependencies of said database for requests of said form; maintainingdata indicating that requests of said form for said database includesaid one or more dependencies; passing said request to said database andreceiving results; and maintaining information including said requestand said results of said request.
 2. The method of claim 1, wherein:maintaining said information includes maintaining an indication of atime said results were determined.
 3. The method of claim 2, furthercomprising: if previously received results are being maintained for saidrequest: accessing said data to determine said one or more dependencies;accessing information indicating a time of last update to said one ormore dependencies; determining whether said previously received resultsare valid by comparing said indication of said time said results weredetermined with said time of last update to said one or moredependencies; returning said results in response to said request if saidpreviously received results are valid.
 4. The method of claim 3, furthercomprising: if said previously received results are determined to beinvalid: passing said request to said database and receiving updatedresults; updating said information maintained for said request toindicate said updated results.
 5. The method of claim 1, wherein: saidnative analysis facility is “EXPLAIN PLAN.”
 6. The method of claim 1,wherein: said native analysis facility is a debugging facility providedby said database.
 7. The method of claim 1, wherein: said nativeanalysis facility is a performance analysis facility provided by saiddatabase.
 8. The method of claim 1, wherein: said one or moredependencies of said database for requests of said form are one or moretables of said database upon which requests of said form depend forresults.
 9. The method of claim 1, wherein: said one or moredependencies of said database for requests of said form are one or morerows of one or more tables of said database upon which requests of saidform depend for results.
 10. The method of claim 1, further comprising,if previously received results are not being maintained for saidrequest: abstracting said request by removing any literal expressionsincluded therein to generate said form of said request; wherein saidrequest is a database query and said form of said request is a skeletonof said database query.
 11. The method of claim 1, wherein: receivingsaid request includes receiving said request at a first caching system;passing said form of said request includes passing said form of saidrequest from a second caching system to said database after receiving atleast one of said request and said request at said second caching systemfrom said first caching system; and maintaining said data andmaintaining said information include maintaining said data and saidinformation at said second caching system.
 12. The method of claim 11,wherein: maintaining said data and maintaining said information includemaintaining said data and information at said first caching system andsaid second caching system.
 13. A database caching system, comprising: adatabase including a native analysis facility to analyze querystructures; a caching intermediary in communication with said databaseand an application accessing said database, said caching intermediarymaintains one or more queries and results received from said database inresponse to said one or more queries, said caching intermediary receivesa first query from said application and determines if said first queryand previously received results for said first query are maintained bysaid caching intermediary, said caching intermediary passes a form ofsaid first query to said native analysis facility at said database ifsaid caching intermediary is not maintaining results for said firstquery and accesses results of said analysis facility to determine one ormore tables of said database upon which queries of said form depend,said caching intermediary further maintains said form of said firstquery and an indication of said one of more tables of said database uponwhich queries of said form depend.
 14. The database caching system ofclaim 13, wherein: said caching intermediary maintains an indication ofdependencies from said database and a time of last modification of eachof said dependencies; said caching intermediary receives a second queryfrom said application and after determining that said second query isnot maintained at said caching intermediary, passes a form of saidsecond query to said native analysis facility and accesses results ofsaid analysis facility to determine if said second query reads ormodifies data at said database, said caching intermediary determinesfrom said results one or more tables of said database modified byqueries of said form of said second query if said second query updatesdata at said database, said caching intermediary passes said secondquery to said database and receives results in return, said cachingintermediary maintains said second query with an indication of saidresults of said second query and also maintains an indication of saidone or more tables of said database modified by queries of said form ofsaid second query with an indication of a time of most recentmodification to said one or more tables.
 15. The database caching systemof claim 14, wherein: said caching intermediary maintains said form ofsaid second query with an indication of said one or more tables modifiedby queries of said form of said second query.
 16. The database cachingsystem of claim 13, wherein: said form of said first query is astructure of said first query generated by removing any literalexpressions from said first query.
 17. The database caching system ofclaim 13, wherein: said caching intermediary includes a first cachingintermediary at a first processing system and a second cachingintermediary at a second processing system.
 18. The database cachingsystem of claim 17, wherein: said caching intermediary further includesone or more additional caching intermediaries; said second cachingintermediary receives queries received at said first cachingintermediary and said one or more additional caching intermediaries. 19.One or more processor readable storage devices having processor readablecode embodied on said one or more processor readable storage devices,said processor readable code for programming one or more processors toperform a method comprising: receiving a request for information from adatabase; determining if previously received results for said requestare maintained locally; and if previously received results are notmaintained locally for said request: passing a form of said request tosaid database by calling a native analysis facility at said database toassess how said database processes requests of said form; accessingresults of said assessment by said native analysis facility; determiningfrom said results one or more dependencies of said database for requestsof said form; maintaining data indicating that requests of said form forsaid database include said one or more dependencies; passing saidrequest to said database and receiving results; and maintaininginformation including said request and said results of said request.